AITopics | principal component principal component

Collaborating Authors

principal component principal component

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems

Islam, Chashi Mahiul, Salman, Shaeke, Shams, Montasir, Liu, Xiuwen, Kumar, Piyush

arXiv.org Artificial IntelligenceJul-10-2024

Building on the unprecedented capabilities of large language models for command understanding and zero-shot recognition of multi-modal vision-language transformers, visual language navigation (VLN) has emerged as an effective way to address multiple fundamental challenges toward a natural language interface to robot navigation. However, such vision-language models are inherently vulnerable due to the lack of semantic meaning of the underlying embedding space. Using a recently developed gradient based optimization procedure, we demonstrate that images can be modified imperceptibly to match the representation of totally different images and unrelated texts for a vision-language model. Building on this, we develop algorithms that can adversarially modify a minimal number of images so that the robot will follow a route of choice for commands that require a number of landmarks. We demonstrate that experimentally using a recently proposed VLN system; for a given navigation command, a robot can be made to follow drastically different routes. We also develop an efficient algorithm to detect such malicious modifications reliably based on the fact that the adversarially modified images have much higher sensitivity to added Gaussian noise than the original images.

landmark, node, representation, (16 more...)

arXiv.org Artificial Intelligence

2407.07392

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Leon County > Tallahassee (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models

Salman, Shaeke, Shams, Md Montasir Bin, Liu, Xiuwen

arXiv.org Artificial IntelligenceJul-1-2024

Utilizing a shared embedding space, emerging multimodal models exhibit unprecedented zero-shot capabilities. However, the shared embedding space could lead to new vulnerabilities if different modalities can be misaligned. In this paper, we extend and utilize a recently developed effective gradient-based procedure that allows us to match the embedding of a given text by minimally modifying an image. Using the procedure, we show that we can align the embeddings of distinguishable texts to any image through unnoticeable adversarial attacks in joint image-text models, revealing that semantically unrelated images can have embeddings of identical texts and at the same time visually indistinguishable images can be matched to the embeddings of very different texts. Our technique achieves 100\% success rate when it is applied to text datasets and images from multiple sources. Without overcoming the vulnerability, multimodal models cannot robustly align inputs from different modalities in a semantically meaningful way. \textbf{Warning: the text data used in this paper are toxic in nature and may be offensive to some readers.}

dataset, principal component principal component, representation, (15 more...)

arXiv.org Artificial Intelligence

2407.01157

Country: North America > United States > Florida > Hillsborough County > University (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (0.34)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models

Salman, Shaeke, Shams, Md Montasir Bin, Liu, Xiuwen, Zhu, Lingjiong

arXiv.org Artificial IntelligenceFeb-13-2024

Transformer-based models have dominated natural language processing and other areas in the last few years due to their superior (zero-shot) performance on benchmark datasets. However, these models are poorly understood due to their complexity and size. While probing-based methods are widely used to understand specific properties, the structures of the representation space are not systematically characterized; consequently, it is unclear how such models generalize and overgeneralize to new inputs beyond datasets. In this paper, based on a new gradient descent optimization method, we are able to explore the embedding space of a commonly used vision-language model. Using the Imagenette dataset, we show that while the model achieves over 99\% zero-shot classification performance, it fails systematic evaluations completely. Using a linear approximation, we provide a framework to explain the striking differences. We have also obtained similar results using a different model to support that our results are applicable to other transformer models with continuous inputs. We also propose a robust way to detect the modified images.

dataset, principal component principal component, representation, (13 more...)

arXiv.org Artificial Intelligence

2402.08473

Country: North America > United States > Florida > Leon County > Tallahassee (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Transportation > Ground > Road (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Intriguing Equivalence Structures of the Embedding Space of Vision Transformers

Salman, Shaeke, Shams, Md Montasir Bin, Liu, Xiuwen

arXiv.org Artificial IntelligenceJan-27-2024

Pre-trained large foundation models play a central role in the recent surge of artificial intelligence, resulting in fine-tuned models with remarkable abilities when measured on benchmark datasets, standard exams, and applications. Due to their inherent complexity, these models are not well understood. While small adversarial inputs to such models are well known, the structures of the representation space are not well characterized despite their fundamental importance. In this paper, using the vision transformers as an example due to the continuous nature of their input space, we show via analyses and systematic experiments that the representation space consists of large piecewise linear subspaces where there exist very different inputs sharing the same representations, and at the same time, local normal spaces where there are visually indistinguishable inputs having very different representations. The empirical results are further verified using the local directional estimations of the Lipschitz constants of the underlying models. Consequently, the resulting representations change the results of downstream models, and such models are subject to overgeneralization and with limited semantically meaningful generalization capability.

agama lizard, principal component principal component, wombat, (12 more...)

arXiv.org Artificial Intelligence

2401.15568

Country: North America > United States > Florida > Hillsborough County > University (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (0.46)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback